Bone image analysis and the classification of bone cancers have both advanced thanks to
deep learning (DL), most notably convolutional neural networks (CNNs). This study
proposes a new CNN-based methodology for classifying pelvic bone tumors specifically.
The aim is to create a DL-based classification system for pelvic bone computed
tomography (CT) images. The proposed technique uses a CNN architecture to automatically
extract features from the CT images and classify them into distinct tumor categories.
A total of 178 3D CT images were identified and included retrospectively. The
image-based model was built with DenseNet, using the Adam optimizer and cross-entropy
loss. The system is assessed using a variety of performance indicators, including
sensitivity, specificity, and F1-score. The experimental findings show that the proposed
DL-based classification system achieves high accuracy (94%), making it useful for the
diagnosis and treatment of pelvic bone tumors. Our promising results might hasten the
use of DL-assisted CT diagnosis for pelvic bone tumors in the future.
1. INTRODUCTION

Primary malignancies of the bone and joints are the third most prevalent cause of mortality for 
cancer patients under age 20 [1]. Different therapies are required depending on whether a bone tumor is 
classified as benign, intermediate, or malignant by the World Health Organization (WHO) [2]. Pelvic bone
tumors come in a variety of types, and they are frequently treated differently. Understanding the form of a 
tumor is crucial for developing the appropriate treatment strategy. Most bone tumors are benign and do
not tend to spread. Although they can appear in any bone, they are most often located in the largest ones.
These comprise the humerus (upper arm bone), tibia (shinbone), femur (thighbone), and pelvis. Certain kinds
are more prevalent in particular regions, including the spine or the area around the growth plates of the
largest bones. Benign bone tumors are a broad category that includes many different forms of tumors.
The most typical ones include osteochondromas, chondroblastomas, giant cell tumors, periosteal
chondromas, osteoid osteomas, enchondromas, and chondromyxoid fibromas. A number of other conditions,
including fibrous dysplasia, unicameral bone cysts, and other cysts, are grouped with benign bone tumors.
Even though they aren't actually tumors, they are frequently treated similarly. Compared to
benign tumors, malignant (cancerous) tumors are more harmful and less common. When a tumor is classified
as malignant, it indicates that there is a moderate to high chance that it may spread from its original site.
Cancer cells spread via the lymphatic or blood vessels. Most frequently, malignant bone tumors spread to
neighboring bones or the lungs. Malignant bone tumors can develop at nearly any age. Two of the most
prevalent, osteosarcoma and Ewing's sarcoma, often affect individuals 30 years of age or younger. On the
other hand, chondrosarcomas, malignant tumors that develop as cartilage-like tissue, typically appear after
the age of thirty. Chondrosarcoma, chordoma, Ewing's sarcoma, neuroblastoma, and osteosarcoma are
examples of malignant bone tumors. Because radiographs usually show the lesion's location, internal matrix,
margins, and any accompanying periosteal reaction, they are the best first-line imaging modality for
evaluating bone lesions [3]. A differential diagnosis of bone lesions may
frequently be made using these lesion characteristics and the patient's age [3], [4]. The issue is that 
radiographs have their limits. The radiographic diagnosis may be more difficult because of superimpositions, 
poorly seen partial cortical loss, and difficulties interpreting flat and short bones and soft tissues [5]. 
Radiologists have expressed interest in using contemporary computer-aided diagnostic (CAD) tools to save
time and effort [6]. We use a set of computed tomography (CT) scans because of their excellent spatial and
contrast resolution, which makes it possible to examine minute bone characteristics. These features enable
accurate classification of small lesions and assessment of the replacement of fatty marrow by soft tissue in
metastatic lesions. In general, CT is also readily available and comparatively inexpensive. The
primary drawback of CT is that it exposes patients to more radiation than a bone scan would [7].

Many research studies have applied DL methods to medical problems in recent years [8], [9]. One
advantage of such methods is that features are learned automatically and large data sets can be handled.
Convolutional neural networks (CNNs) are a DL approach widely regarded as one of the best-suited
architectures for image classification. CNNs in their many architectural variants have achieved high
performance in the classification, segmentation, and detection of medical images [10]. Essential elements
of the conceptual design of CNNs are spatial down-sampling, weight sharing, and local receptive fields [11].
Many recent studies have used different medical imaging modalities and several DL algorithms to detect and
classify bone tumors. In 2018, 3D CT scans of the spine from severely ill patients were used to address the
segmentation and classification of difficult-to-define lytic and sclerotic metastatic lesions, with a CNN
performing the feature extraction [7]. Additionally, in 2018, methods for automated abdominal anatomy segmentation
utilizing CT images were developed, along with methods for diagnosis, treatment planning, and therapy 
delivery [12]. The segmentation approaches included statistical models and multi-atlas label fusion (MALF) 
[13]. In 2020, a new CNN architecture was used to classify three types of brain tumors. Tested on
T1-weighted contrast-enhanced MRI, it was found to be simpler than existing pre-trained networks. The
network's performance was assessed with four approaches that combined two 10-fold cross-validation
techniques and two datasets [14]. Furthermore, in 2021, the classification of histological bone tumor
aggressiveness by pathologists was compared with that of VGG-based deep learning (DL) models. A total of
427 pathological slides of bone tumors were digitized as whole slide images (WSIs), and pathologists
annotated the tumor area in each WSI. The most effective models were then compared with four pathologists
with varying levels of expertise [15]. Additionally, studies were done in 2021 to develop a deep learning system that uses
patient demographics and regular magnetic resonance imaging (MRI) to discern between benign and 
malignant bone lesions. Using T1- and T2-weighted pre-operative MRI, 1,060 histologically verified bone 
lesions were identified and included retrospectively [16]. Furthermore, fusion models utilizing deep learning 
and machine learning were developed in 2022 to categorize bone cancers as benign, malignant, or 
intermediate utilizing the lesion's conventional radiography as well as possibly pertinent clinical information 
[17]. In 2021, an intelligent clinical decision support system was proposed that uses MRI images to
diagnose and classify brain tumors. Seven CNN architectures were chosen, pre-trained on the ImageNet
dataset, and then adapted to MRI images of brain tumors extracted from the brain tumor segmentation
(BraTS) 19 database [18]. Also in 2021, deep ensemble learning with deep convolutional neural networks
(DCNNs) was used to classify skin lesions. The ensemble learning method improved the F1-score,
sensitivity, specificity, accuracy, and precision of three DCNN architectures: DenseNet 201, Inception V3,
and Inception ResNet V2 [11]. Furthermore, a number of deep learning models were developed in 2021 to
automatically classify brain tumors; four models, AlexNet, VGG16, GoogleNet, and ResNet50, were
used [19]. In 2022, a deep neural network architecture based on the transfer learning
theory was created in order to identify and categorize pictures of the histology of breast cancer. The 
researchers used three pre-trained CNN architectures (ResNet50, inception-v3, and VGG-16) inside the 
proposed framework to extract features from images and distinguish between benign and malignant tumor 
cells in the histopathology images of breast cancer. The extracted features were then concatenated and fed
into a fully connected (FC) layer [20]. A total of 3,586 images of benign and malignant brain tumors were
used in another 2022 study, in which the CNN classification stage was built on top of the ResNet50
architecture. To prevent over-fitting, a global average pooling (GAP) layer was applied to the ResNet50
output, and a sigmoid layer produced the final classification [6]. Additionally, a 2022 study presented a
new mathematical model that uses efficient processing to evaluate and identify the features of artificial
pulse audio signals. During the training phase, which relies mainly on deep neural network-based learning
(DNNL), the functional blocks of the numerical model are further connected through the feedback
connections (FCs) of a recurrent long short-term memory (R-LSTM) structure [21].
Research that was recently carried out in 2023 focuses on the employment of an interval selection technique 
in conjunction with a heterogeneous client balancing strategy to classify EEG data using the ResNet50 deep 
architecture [22]. 

Because bone tumors take many forms and are uncommon, few radiologists have the
skills necessary to make an accurate diagnosis. Additionally, early tumor detection can greatly aid doctors in
early management and in determining the necessary treatment strategy. Thus, the goal of this work is to
develop a deep learning-based system for classifying CT images of pelvic bones. The suggested technique
uses a CNN architecture to automatically extract features from 3D CT scans and classify the images into
several tumor types. Pelvic CT scans from multiple patients were used to train and evaluate the CNN model.


2. METHOD 

This study proposes a novel 14-category classification scheme for pelvic bone tumors. The
recommended deep learning method for pelvic bone malignancy detection is shown in Figure 1. The analysis
starts with pre-processing, which is applied to the images before they are fed to the network. CNN
classification is the next stage in our pipeline. A thorough description of the data sets, CNN
hyper-parameter values, optimization method, training computations, and performance calculations
can be found in the following subsections.


Figure 1. Block schematic illustrating the suggested approach: 3D pelvic bone dataset, pre-processing (data resizing and splitting), DenseNet121 construction and parameter optimization, and output of one of 14 classes


2.1. Dataset 

We used an online dataset published on the Zenodo website in 2021 and available at
https://zenodo.org/record/4588403#.YEyLq OzaCo. It is a pelvic CT dataset with multi-bone labeling that
includes 1,184 CT volumes (more than 320 K CT slices) from different manufacturers and fields, 75 of which
contain metal artifacts [23]. Seven sources, including two clinics and five existing
CT databases, provided the images [24]-[27]. These seven sub-datasets, which were independently gathered
from multiple locations and sources, each contain unique characteristics that are frequently observed in
clinics. Cases of very poor quality or without a pelvic region are excluded, and unrelated areas outside the
pelvis are removed. All images are in neuroimaging informatics technology initiative (NIfTI) format to
simplify data processing using Python. Examples of the dataset with varied properties are shown in Figure 2.
Table 1 presents a detailed summary of this dataset.
Figure 2. Images from the pelvic CT dataset examples in various contexts [23]


Table 1. Summary of the pelvic CT dataset 


No.  Dataset name             Number of 3D volumes  Mean spacing (mm)    Mean size (voxels)  Year
1    ABDOMEN                  35                    (0.76, 0.76, 3.80)   (512, 512, 73)      2015
2    COLONOG                  731                   (0.75, 0.75, 0.81)   (512, 512, 323)     2008
3    MSD_T10                  155                   (0.77, 0.77, 4.55)   (512, 512, 63)      2019
4    KITS19                   44                    (0.82, 0.82, 1.25)   (512, 512, 240)     2019
5    CERVIX                   41                    (1.02, 1.02, 2.50)   (512, 512, 102)     2015
6    CLINIC                   103                   (0.85, 0.85, 0.80)   (512, 512, 345)     2020
7    CLINIC-metal             75                    (0.83, 0.83, 0.80)   (512, 512, 334)     2020
     Total pelvic CT dataset  1,184                 (0.78, 0.78, 1.46)   (512, 512, 273)     8-2-2021
     Our dataset (CTs)        178                   (0.85, 0.85, 0.80)   (512, 512, 345)     2022


The dataset has seven sub-datasets. Two of them, CLINIC and CLINIC-metal, were
acquired from an orthopedic hospital and are pertinent to pelvic fractures. CLINIC is gathered from images
taken before surgery, which do not contain metal artifacts, whereas CLINIC-metal is acquired from images
taken after surgery, which do. The KITS19 sub-dataset comes from the KITS19 kidney and kidney tumor
segmentation challenge [24]. The CERVIX and ABDOMEN sub-datasets were produced for the workshop and
challenge on multi-atlas labeling beyond the cranial vault; both are multi-organ segmentation datasets
covering different body parts. MSD_T10 is the tenth sub-dataset of the medical segmentation decathlon
(MSD), in which colon tumors are segmented [25], [26]. COLONOG, a sub-dataset of the CT COLONOGRAPHY
dataset, focuses on CT colonography research [27]. We focus only on the 178 CTs that contain tumors in
the pelvic bones. This dataset contains both benign and malignant pelvic bone tumors.

Our dataset therefore consists of 178 3D CT images taken only from the CLINIC and CLINIC-metal
sub-datasets, because only these contain CT images with tumors. The remaining sub-datasets provide
labeling of the pelvic bones only.


2.2. Data pre-processing 

To reduce the total processing time for training and testing, a pre-processing step is first performed
on the original images. Pre-processing refers to basic operations on images at the lowest level of
abstraction. It covers every change made to the raw data before it is fed into the DL model, and it is
applied to improve the image and remove spurious information. Initially, every CT scan was adjusted to a
mean spacing of (0.85, 0.85, 0.80) mm and a mean size of (512, 512, 345) voxels. Because accuracy was
already high when performance was evaluated with only the 178 images, no data augmentation was required.
Next, the data were divided into two sets: training (80%) and testing (20%). Then, to speed up processing,
the images were resized once more to (50, 50, 50) voxels.
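
The 80%/20% division described above can be sketched in a few lines of standard-library Python. The paper does not say whether the split was stratified by tumor class, so the per-class (stratified) splitting below is an assumption made for illustration; it is one way to keep even the two-image classes of Table 3 represented in both sets. The function name `stratified_split` and the seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        # Assumption: keep at least one sample on each side for classes with 2+ images.
        if len(indices) > 1:
            n_train = min(max(n_train, 1), len(indices) - 1)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


# Class sizes copied from Table 3 (label index = table row number - 1).
CLASS_COUNTS = [53, 33, 18, 11, 10, 3, 2, 2, 16, 12, 11, 3, 2, 2]
labels = [c for c, n in enumerate(CLASS_COUNTS) for _ in range(n)]

train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With a plain random split, the smallest classes could easily end up entirely in the training set or entirely in the test set, which is why a per-class split is used in this sketch.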

Pelvic bone tumors are divided into 14 classes according to their type. Assigning a tumor to its type
is very important for early detection and diagnosis. The forms of benign and malignant
pelvic bone tumors are listed in Table 2; the data collection includes both. The number of images in
each class is given in Table 3.
Table 2. Different kinds of pelvic bone tumors, benign and malignant [28]-[34]


Benign                                                 Malignant
a. Osteochondroma                                      a. Osteosarcoma
b. Eosinophilic granuloma (EG)                         b. Chondrosarcoma
c. Periosteal chondroma or juxtacortical chondroma     c. Ewing's sarcoma family of tumors (ESFTs)
d. Chondromyxoid fibroma (CMF)                         d. Chordoma
e. Desmoplastic fibroma or collagenous fibroma         e. Lymphoma
f. Benign fibrous histiocytoma                         f. Metastatic bone carcinoma
                                                       g. Angiosarcoma
                                                       h. Hemangiopericytoma


Table 3. Details of the dataset of images 


Number  Tumor                                              Number of images
1       Osteosarcoma                                       53
2       Chondrosarcoma                                     33
3       Ewing's sarcoma (ESFTs)                            18
4       Chordoma                                           11
5       Lymphoma                                           10
6       Metastatic bone carcinoma                          3
7       Angiosarcoma                                       2
8       Hemangiopericytoma                                 2
9       Osteochondroma                                     16
10      Eosinophilic granuloma (EG)                        12
11      Periosteal chondroma or juxtacortical chondroma    11
12      Chondromyxoid fibroma (CMF)                        3
13      Desmoplastic fibroma or collagenous fibroma        2
14      Benign fibrous histiocytoma                        2
        Total                                              178


2.3. DenseNet121 architecture 

The training model is built using the dense convolutional network (DenseNet) [35]. Figure 3 shows
the construction of DenseNet. Each layer uses the concatenated feature maps of all previous layers as input
rather than averaged feature maps. Because redundant feature maps do not have to be relearned, DenseNets
reuse features and need fewer parameters than a comparable regular CNN. Consequently, the $l$-th layer
receives the feature maps of all preceding layers, $x_0, \dots, x_{l-1}$, as its input:


$$x_l = H_l([x_0, x_1, \dots, x_{l-1}])$$ (1)


where $[x_0, x_1, \dots, x_{l-1}]$ represents the concatenation of the feature maps produced in layers
$0, \dots, l-1$. The composite function $H_l(\cdot)$ consists of three operations: a rectified linear unit
(ReLU) [36], batch normalization (BN) [37], and a 3x3x3 convolution (conv). To make implementation
simpler, the multiple inputs of $H_l$ are combined into a single tensor.
Figure 3. Block schematic of the suggested DenseNet architecture
The feature map grows after passing through each dense layer because each layer adds $k$ features
on top of the global state, i.e., the features that were already there. The growth rate of the network,
denoted by the parameter $k$, dictates the amount of information that each layer adds. If every function
$H_l$ produces $k$ feature maps, then the $l$-th layer has $k_l$ input feature maps, where $k_0$ is the
number of channels in the input layer:


$$k_l = k_0 + k \times (l - 1)$$ (2)


Even though each layer produces only $k$ output feature maps, it may receive a substantial number of
inputs, particularly in later layers. To increase processing speed and efficiency, a (1x1x1)
convolution layer may be added as a bottleneck layer before each (3x3x3) convolution [38], [39].
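
As a small worked example of equation (2), the sketch below computes how many input feature maps each layer of a dense block receives. The values `k0 = 64` and `k = 32` are assumptions taken from the commonly used DenseNet-121 configuration; the text above does not state which values were used in this study, and the helper names are hypothetical.

```python
def dense_layer_input_channels(k0, growth_rate, layer_index):
    """Input feature maps of the l-th layer (1-based) of a dense block: k0 + k * (l - 1)."""
    if layer_index < 1:
        raise ValueError("layer_index is 1-based")
    return k0 + growth_rate * (layer_index - 1)


def dense_block_output_channels(k0, growth_rate, num_layers):
    """Feature maps leaving the block: the block input plus k maps per layer."""
    return k0 + growth_rate * num_layers


# Assumed values (not given in the text): 64 input channels, growth rate 32.
k0, k = 64, 32
for l in (1, 2, 6):
    print(l, dense_layer_input_channels(k0, k, l))
print("block output:", dense_block_output_channels(k0, k, 6))
```

The linear growth of the input width with depth is what motivates the 1x1x1 bottleneck convolution mentioned above.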
As illustrated in Table 4, DenseNet-121 has 6 AvgPools and 13 convolutions. Each layer in the
DenseNet121 design receives all of the outputs from the layers that came before it, creating an extremely
dense network architecture that enables deep supervision. Each convolution block in our suggested model
employs skip connections to get around the problem of vanishing gradients. Lastly, the output layer of the
original network has been replaced with a global average pooling (GAP) layer. The GAP layer transforms an
(MxMxN) feature map, in which (MxM) represents the spatial size and N the number of filters, into a (1xN)
feature map. The convolutional blocks are used for extracting features. The GAP layer has the following
additional advantages: it has no parameters to optimize, which helps prevent over-fitting; it acts as a
flatten layer, reducing the multidimensional map of extracted features to a single dimension; and it
requires less computation time.
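
The reduction from an (MxMxN) feature map to a (1xN) vector can be illustrated with a minimal pure-Python sketch. The nested-list layout (one MxM grid per filter) and the function name are assumptions chosen for readability; a real model would use the pooling layer of its deep learning framework.

```python
def global_average_pool(feature_map):
    """Average each channel's M x M grid, turning N channels into a list of N values."""
    pooled = []
    for channel in feature_map:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy feature map with N = 3 channels of size 2 x 2 (hypothetical numbers).
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
    [[5.0, 5.0], [5.0, 5.0]],
]
print(global_average_pool(feature_map))  # [2.5, 2.0, 5.0]
```

The function contains no trainable weights, which is the property the text credits with reducing over-fitting.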


Table 4. DenseNet121 configuration 


Layers                 Output size   DenseNet-121
Convolution            50*50*50      3*3*1 conv, stride 3
Pooling                50*50*50      3*3*3 max pool, stride 3
Dense block (1)        50*50*50      [1*1 conv, 3*3 conv] x 6
Transition layer (1)   50*50*50      1*1*1 conv
                       25*25*25      2*2*2 average pool, stride 3
Dense block (2)        25*25*25      [1*1 conv, 3*3 conv] x 12
Transition layer (2)   25*25*25      1*1*1 conv
                       12*12*12      2*2*2 average pool, stride 3
Dense block (3)        25*25*25      [1*1 conv, 3*3 conv] x 24
Transition layer (3)   25*25*25      1*1*1 conv
                       6*6*6         2*2*2 average pool, stride 3
Dense block (4)        6*6*6         [1*1 conv, 3*3 conv] x 16
Transition layer (4)   6*6*6         1*1*1 conv
                       3*3*3         2*2*2 average pool, stride 3
Classification layer   1*1*1         3*3*3 global average pool
                                     1000D fully-connected, softmax


The 'vanishing gradient' problem arises as the number of layers in a CNN increases, that is, as the
network gets deeper. When the path between the input and output layers lengthens, some information
may 'vanish' or 'get lost', which impairs the network's ability to train. DenseNets address this
problem by changing the conventional CNN architecture and simplifying the connectivity pattern between
layers. All layers with matching feature map sizes are connected directly to each other. To keep the
feed-forward nature intact, each layer accepts additional inputs from all previous layers and passes its
own feature maps on to all subsequent layers. This dense connection design has the somewhat surprising
benefit of requiring fewer parameters than conventional convolutional networks, since redundant feature
maps do not need to be relearned. Every feature map in the network is available to inform the final
classifier's decision. Because DenseNet layers are very narrow (e.g., 12 filters per layer), each layer
adds only a small set of feature maps to the network's "collective knowledge" while the remaining feature
maps stay unchanged. Among the many advantages of DenseNets are the reduction of vanishing gradient
problems, improved feature propagation, ease of feature reuse, and a significant decrease in the number of
parameters [35].


2.4. Training 

The model was implemented in Python (v3.7) using the DenseNet121 network provided by MONAI [40]. Models
were trained in PyTorch (v1.6) on a machine with an AMD Ryzen 7 5800H processor running at 3.2 GHz and a
graphics processing unit (GPU). Numerical index labeling is used: each tumor class is assigned a label
index, giving 14 labels in this study. The networks are trained with predefined transforms, such as
resizing the image and applying RandRotate90 so that the model tolerates rotations of the image. Training
runs for 100 epochs with a batch size of 60, which was chosen because of GPU memory constraints. The
starting learning rate is set to 0.1 and divided by 10 at epoch 90. Furthermore, the DenseNet was created
with 3 spatial dimensions, 1 input channel, and 14 output channels, and was trained with the cross-entropy
loss [41] and the Adam optimizer [42]. Finally, the performance metrics were evaluated.
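
The learning-rate schedule described above (0.1, divided by 10 at epoch 90, over 100 epochs) can be written as a simple step function. This is a sketch only: it assumes zero-based epoch counting and that the reduced rate applies from epoch 90 onward, neither of which is spelled out in the text, and the function name is hypothetical.

```python
def learning_rate(epoch, base_lr=0.1, drop_epoch=90, factor=10.0):
    """Step schedule: base_lr before drop_epoch, base_lr / factor from drop_epoch on."""
    return base_lr if epoch < drop_epoch else base_lr / factor


NUM_EPOCHS = 100
schedule = [learning_rate(epoch) for epoch in range(NUM_EPOCHS)]

# Show the rate at the start, just before the drop, at the drop, and at the end.
for epoch in (0, 89, 90, 99):
    print(epoch, schedule[epoch])
```

In a PyTorch training loop the same behaviour is usually obtained from a built-in scheduler rather than a hand-written function; the version here only makes the arithmetic explicit.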


2.4.1. Cross entropy loss 

Let $S$ stand for the sample space and $L = \{l_1, l_2, \dots, l_m\}$, $m > 2$, for the finite set of labels
in a multi-class classification problem. The mapping from a sample $x$ to the label set $L$ is many-to-one:
each sample has only one label, but several samples may share the same label. A multi-class classification
task consists of determining which of the $m$ classes a sample belongs to. The connection weight between
the $i$-th and $j$-th neurons is denoted by $w_{ij}$. The softmax function is


$$y_i = \mathrm{softmax}(z_i) = \frac{\exp(z_i)}{\sum_{j=1}^{m} \exp(z_j)}$$ (3)


where $i \in \{1, 2, \dots, m\}$, and the neural output is $z_j = \sum_i w_{ij} x_i$.
The cross-entropy function is


$$E(W) = -\sum_{i=1}^{m} \hat{y}_i \log(\mathrm{softmax}(z_i)) = -\sum_{i=1}^{m} \hat{y}_i \log(y_i)$$ (4)


where $m$ denotes the total number of classes, $y_i$ denotes the $i$-th prediction class of MPCE, and
$\hat{y}_i$ denotes the $i$-th true class of the training samples.
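
For a single sample with a one-hot true label, the softmax and cross-entropy computation can be sketched in plain Python as follows. The function names and the example logits are hypothetical, and subtracting the maximum logit is a standard numerical-stability step rather than something described in the text.

```python
import math


def softmax(logits):
    """Convert raw outputs z_1..z_m into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Cross-entropy for a one-hot target: minus the log-probability of the true class."""
    return -math.log(softmax(logits)[true_class])


# Hypothetical outputs for a 3-class problem; the true class is index 2.
logits = [1.0, 2.0, 3.0]
print(softmax(logits))
print(cross_entropy(logits, true_class=2))
```

With 14 equally likely classes the loss equals log(14), about 2.64, which is a useful reference value when checking whether training has started to make progress.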


2.4.2. Adam optimizer 

Adam is a stochastic gradient descent method that relies on adaptive estimates of the first- and
second-order moments of the gradients. In contrast to traditional stochastic gradient descent, which uses
a single step size for all weights, Adam iteratively updates the network weights from the training data
with a step size adapted to each parameter. The method was first presented by Diederik Kingma from OpenAI
and Jimmy Ba from the University of Toronto in their 2015 ICLR paper. It is suitable for problems
involving large amounts of data and/or parameters because of its computational efficiency, low memory
requirements, and invariance to diagonal rescaling of the gradients. The next algorithm shows the Adam
optimization method:


Algorithm: Adam, the algorithm for stochastic optimization proposed in [42].
$g_t^2$ indicates the elementwise square $g_t \odot g_t$.
Good default settings for the tested machine learning problems are $\alpha = 0.001$,
$\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 10^{-8}$. All operations on vectors are element-wise.
With $\beta_1^t$ and $\beta_2^t$ we denote $\beta_1$ and $\beta_2$ to the power $t$.
Require: $\alpha$: Stepsize
Require: $\beta_1, \beta_2 \in [0, 1)$: Exponential decay rates for the moment estimates
Require: $f(\theta)$: Stochastic objective function with parameters $\theta$
Require: $\theta_0$: Initial parameter vector
$m_0 \leftarrow 0$ (Initialize 1st moment vector)
$v_0 \leftarrow 0$ (Initialize 2nd moment vector)
$t \leftarrow 0$ (Initialize timestep)
while $\theta_t$ not converged do
  $t \leftarrow t + 1$
  $g_t \leftarrow \nabla_\theta f_t(\theta_{t-1})$ (Get gradients w.r.t. stochastic objective at timestep $t$)
  $m_t \leftarrow \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t$ (Update biased first moment estimate)
  $v_t \leftarrow \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2$ (Update biased second raw moment estimate)
  $\hat{m}_t \leftarrow m_t / (1 - \beta_1^t)$ (Compute bias-corrected first moment estimate)
  $\hat{v}_t \leftarrow v_t / (1 - \beta_2^t)$ (Compute bias-corrected second raw moment estimate)
  $\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (Update parameters)
end while
return $\theta_t$ (Resulting parameters)
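
The update rules of the Adam algorithm can be traced with a small pure-Python sketch that minimizes a toy quadratic. This is an illustration under stated assumptions, not the training code of this study: the objective, the step count, the step size of 0.01, and the function names are all hypothetical, and a fixed number of steps stands in for the convergence test.

```python
import math


def adam_minimize(grad_fn, theta, steps=1000, alpha=0.001,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run a fixed number of Adam steps on a list of parameters and return the result."""
    theta = list(theta)
    m = [0.0] * len(theta)  # first moment estimate
    v = [0.0] * len(theta)  # second raw moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def quadratic_grad(theta):
    """Gradient of the toy objective (x - 3)^2 + (y + 1)^2, whose minimum is at (3, -1)."""
    return [2 * (theta[0] - 3.0), 2 * (theta[1] + 1.0)]


result = adam_minimize(quadratic_grad, [0.0, 0.0], steps=5000, alpha=0.01)
print(result)
```

Because of the bias correction, the very first step moves each parameter by almost exactly the step size, regardless of the gradient's magnitude.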


3. RESULTS AND DISCUSSION 

Because the task is a multi-class classification (14 pelvic bone tumor types), a multi-label confusion
matrix (MLCM) [43] is used to calculate the accuracy of the classification technique. By measuring the
classification overlap, the confusion matrix is an effective tool for performance evaluation. A confusion
matrix is a two-dimensional matrix whose rows show the true labels and whose columns show the labels
predicted by the classifier. Creating a confusion matrix for a multi-class classifier is straightforward.


$$M(r, c) = \sum_{i=1}^{m} I(y_i = r)\, I(h(x_i) = c), \quad \forall\, r, c \in \{0, \dots, q - 1\}$$ (5)


where $M$ is the confusion matrix, $r$ and $c$ are the row and column indices of the confusion matrix, $m$
is the number of instances in the test data set, $I(\cdot)$ is the indicator function, $x_i$ is the $i$-th
input to the classifier $h(\cdot)$, $y_i$ is the true label assigned to input $x_i$, and $q$ is the number
of classes. The next algorithm shows the method for creating the confusion matrix for a multi-class
classifier.


Algorithm multi-class confusion matrix 
for each input instance do 
r = assigned label 
c = predicted label
M(r,c)+=1 
end for 
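
The counting loop of the algorithm above translates almost line by line into Python. The sketch below uses hypothetical toy labels for a three-class problem rather than the study's data, and the function name is an assumption.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count instances per (true label, predicted label) pair; rows are true labels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for r, c in zip(true_labels, predicted_labels):
        matrix[r][c] += 1
    return matrix


# Hypothetical labels for seven test instances and three classes.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

for row in confusion_matrix(y_true, y_pred, num_classes=3):
    print(row)
# [1, 1, 0]
# [0, 2, 0]
# [1, 0, 2]
```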


A confusion matrix displays the distribution of predictions across all classes in an understandable and
concise manner. After the confusion matrix is normalized row by row, the off-diagonal cells of each row
give the proportion of false negatives (FN) for the corresponding class, and the cells along the main
diagonal give the recall of that class. For imbalanced data sets, the confusion matrix with the raw counts
of true and false predictions provides information on the size of each class, while the normalized matrix
is useful for reading off the percentage of true and false predictions. The primary benefit of a confusion
matrix is the ability to see and examine, in a single view, how the correctly predicted labels are
distributed and how they overlap with other labels. It is also used to compute the true positives (TP),
true negatives (TN), false positives (FP), and false negatives (FN) needed to determine precision, recall,
and F1-score, the most important and commonly used metrics in classifier evaluation. The resulting
multi-class confusion matrix is displayed in Table 5. The precision, recall, and F1-score values for every
class are shown in Table 6. These may be computed in the following way for every class:


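$$\mathrm{Precision} = \frac{TP}{TP + FP}$$ (6)

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$ (7)

$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$ (8)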

Table 5. Resulting confusion matrix (rows: true classes; columns: predicted classes)

Classes   C0   C1   C2   C3   C4   C5   C6   C7   C8   C9  C10  C11  C12  C13
C0        11    0    0    0    0    0    0    0    0    0    0    0    0    0
C1         0    9    0    0    0    0    0    0    1    0    0    0    0    0
C2         0    0    3    0    0    0    0    0    0    0    0    0    0    1
C3         0    0    0   53    0    1    0    0    0    0    0    0    0    0
C4         0    1    0    0   10    0    0    0    0    0    0    0    0    0
C5         0    0    0    0    0    3    0    0    1    0    0    0    0    0
C6         0    0    0    0    1    0   33    0    0    0    1    0    0    0
C7         0    0    0    0    0    0    0    2    0    0    0    0    0    0
C8         0    0    0    1    0    0    0    0   14    0    0    0    0    0
C9         1    0    0    0    0    0    0    0    0   17    0    0    0    0
C10        0    0    0    0    0    0    0    0    0    0   10    0    0    0
C11        1    0    0    0    0    0    0    1    0    0    0    3    0    0
C12        0    0    0    0    0    0    0    0    0    0    0    0    2    0
C13        0    0    0    0    0    0    1    0    0    1    0    0    0    6


Table 6. The results of each measurement for each class

Class   Precision   Recall   F1-score
C0      0.99        0.917    0.995
C1      0.9         0.9      0.989
C2      0.75        0.99     0.99
C3      0.98        0.98     0.99
C4      0.99        0.91     0.995
C5      0.75        0.99     0.995
C6      0.97        0.97     0.995
C7      0.99        0.667    0.99
C8      0.99        0.99     0.984
C9      0.99        0.944    0.99
C10     0.99        0.99     0.957
C11     0.99        0.99     0.989
C12     0.99        0.99     0.989
C13     0.99        0.99     0.984

Despite its name, the multilabel_confusion_matrix function in the sklearn Python package [44] computes
one-versus-rest confusion matrices (much like binary confusion matrices). TP, TN, FP, and FN are therefore
determined for each class using this function.
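
The one-versus-rest counts can also be read directly off a multi-class confusion matrix, which makes explicit what is being counted for each class. The sketch below is a plain-Python illustration with a hypothetical 3x3 matrix; it does not call sklearn, the helper names are assumptions, and the F1-score uses the standard harmonic-mean definition.

```python
def one_vs_rest_counts(matrix, k):
    """Return (TP, TN, FP, FN) for class k; rows are true labels, columns predictions."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # class k predicted as something else
    fp = sum(row[k] for row in matrix) - tp  # other classes predicted as class k
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


def precision_recall_f1(matrix, k):
    """Per-class precision, recall, and F1-score (harmonic mean of the two)."""
    tp, _, fp, fn = one_vs_rest_counts(matrix, k)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical confusion matrix for three classes.
M = [[1, 1, 0],
     [0, 2, 0],
     [1, 0, 2]]
for k in range(3):
    print(k, one_vs_rest_counts(M, k), precision_recall_f1(M, k))
```

Note that TN is usually large for every class in a multi-class problem, which is one reason per-class scores that include TN tend to look high.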

One of the measures most commonly employed in multi-class classification is accuracy, which is 
computed directly from the confusion matrix. 


$$\mathrm{Total\ Accuracy} = \frac{\mathrm{Total\ TP}}{\mathrm{Total\ TP} + \mathrm{Total\ FP}}$$ (9)

The numerator of the accuracy formula is the total number of correctly classified samples, i.e., the
entries on the main diagonal of the confusion matrix. The denominator is the total number of entries in
the confusion matrix, which also includes everything off the main diagonal that the model categorized
incorrectly. According to the confusion matrix in Table 5, the accuracy of the model is 94%.
However, accuracy is dominated by the larger classes, so relying only on it
might be deceptive for imbalanced data [45]. As a result, we also used the F1-score, precision, and recall
metrics, which give a more reliable indicator of model generalization across the imbalanced CT image
dataset. Since these metrics are computed per class, they show the model's performance
independent of the number of samples in each particular class.
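
Overall accuracy and the row-normalized matrix can both be derived from the raw confusion matrix, as the following sketch shows. The matrix is a hypothetical three-class example, not Table 5, and the function names are assumptions.

```python
def overall_accuracy(matrix):
    """Correct predictions (main diagonal) divided by all entries of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def normalize_rows(matrix):
    """Divide each row by its sum so that the diagonal holds the per-class recall."""
    normalized = []
    for row in matrix:
        row_sum = sum(row)
        normalized.append([value / row_sum if row_sum else 0.0 for value in row])
    return normalized


# Hypothetical confusion matrix (rows: true classes, columns: predicted classes).
M = [[1, 1, 0],
     [0, 2, 0],
     [1, 0, 2]]
print(overall_accuracy(M))
print(normalize_rows(M))
```

In this toy example the overall accuracy is 5/7, while the recall of class 0 is only 0.5, which illustrates how a single accuracy figure can hide weak performance on an individual class.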

The confusion matrix shows the network's predictions for the 178 images across all categories. If every
off-diagonal value were zero, each image would have been correctly classified. Clearly, for our
network, this is not the case. The values outside the diagonal indicate which categories are being
misclassified. With more images, the accuracy would be expected to improve further.

Positive predictive value, or precision, expresses the percentage of positive class predictions that
really fall into the positive class. As Table 6 shows, precision is at least 0.9 for all classes except C2
and C5, which means that for most classes the classifier is right at least 90% of the time when it
indicates a given tumor type. Recall, sometimes referred to as sensitivity, measures the proportion of all
positive examples in the dataset that are predicted correctly. In our method, the recall value is at least
90% for every class except C7 (Table 6). The two metrics are often combined into the F1-score
(or F-measure), which balances precision and recall in a single figure. For every class, we attain an
F1-score of greater than 90%.

DenseNet may be more accurate in part because, thanks to the shorter connections, each layer
receives additional supervision from the loss function. One may say that DenseNets perform a form of "deep
supervision." The benefits of deep supervision have previously been demonstrated by deeply-supervised
networks [46], which attach classifiers to every hidden layer and thereby push the intermediate layers to
learn discriminative features. DenseNets carry out a similar deep supervisory function implicitly: a single
classifier placed on top of the network provides direct supervision to all layers through at most two or
three transition layers. Since the same loss function is shared by all layers in DenseNets, the loss
function and gradient are considerably simpler.

Even though the single fine-tuned DenseNet121 model has fewer parameters than many
CNN networks, it still produces extremely good results. Its performance in this experiment, with an
average classification accuracy of 94%, demonstrates that it can be trained on datasets of this kind. In
addition, the use of DenseNet mitigates the vanishing gradient problem. A limitation of this study is the
lack of pelvic bone datasets; nevertheless, the model obtains very good results with these few medical
images.

In addition, previous studies that compared different approaches to classifying different
types of bone lesions were reviewed to assess the approach proposed in this publication. Table 7 compares
the outcomes of the proposed methodology with previously published methods applied to a range of bone
lesions using a variety of medical image types. The suggested ensemble model generates more accurate
results than prior research, as demonstrated in Table 7. Because the ensemble model has a more complex
architecture, its performance is evaluated during the training, validation, and testing phases to ensure
higher performance. For MRI, the maximum reported accuracy was 86%, in 2022 [17], and for WSI the accuracy
was 85.96% [15]. Our method, which uses 3D CT images, gives a higher result (94%).
Table 7. Accuracy of published methods


Method               Year   Medical image type   Accuracy (%)
Our method           2023   3D CT                94
Liu et al. [17]      2022   MRI                  86
Eweje et al. [16]    2021   MRI                  76
Tao et al. [15]      2021   WSI                  85.96
Chmelik et al. [7]   2018   3D CT                80


4. CONCLUSION 

The large intra-class variations and inter-class similarities in tumor size, location, and appearance
make pelvic bone tumor diagnosis extremely challenging. In summary, this study demonstrates that the
model achieves a final accuracy of 94% after training. With the suggested ensemble model, we attain a
classification performance that is greater than the results of prior studies. The
experimental findings show that the suggested framework produces effective results. In addition, the use
of DenseNet mitigates the vanishing gradient problem. Deep learning may be more effective than junior
radiologists in classifying pelvic bone tumors from CT scans, with accuracy similar to that of
subspecialists. Our findings are encouraging and might hasten the use of DL-assisted pelvic bone tumor
diagnosis in the future.